
BUG: Fix segfault in lib.isnullobj #13764


Closed
wants to merge 1 commit

Conversation

gfyoung (Member) commented Jul 23, 2016

A weird segfault arises when you call lib.isnullobj with an array that uses 0-field values to mean None.
Changing the input to a plain Python object (i.e. no typing) makes the segfault go away. Discovered when printing a DataFrame containing such an array produced segfaults.

Closes #13717.

codecov-io commented Jul 23, 2016

Current coverage is 84.56% (diff: 100%)

No coverage report found for master at b60e42b.

Powered by Codecov. Last update b60e42b...cd07769

@@ -342,7 +342,7 @@ def item_from_zerodim(object val):

 @cython.wraparound(False)
 @cython.boundscheck(False)
-def isnullobj(ndarray[object] arr):
+def isnullobj(arr):
Contributor: there is another isnull routine right below this

gfyoung (Member Author), Jul 23, 2016: Sigh... will fix that too.

jreback (Contributor) commented Jul 23, 2016

You need to check that it's iterable now, e.g. a scalar will fail.

jreback (Contributor) commented Jul 23, 2016

Or at least assert that (as this cannot be called with a scalar).
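A minimal sketch (plain Python rather than the merged Cython) of the kind of assertion being suggested; the guard itself is hypothetical, not the code that was committed:

import numpy as np

def isnullobj(arr):
    # reject scalars up front, since the function only makes sense for arrays
    assert isinstance(arr, np.ndarray), "isnullobj expects an ndarray, not a scalar"
    ...  # the existing element-wise null check would continue here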

        tm.assert_numpy_array_equal(result, expected)

    def test_non_ndarray_inp(self):
        ind = pd.Index([1, None, 'foo', -5.1, pd.NaT, np.nan])
Contributor: put all of these tests with the null tests - you will have to find them

Member Author: Not a very helpful comment, tbh. This is a lib method, so I don't see why we wouldn't put the tests here rather than go "hunting" for unit tests like this.

Contributor: Because you will now have tests in multiple places for the same thing. Some lib tests are segregated; try types/missing.

gfyoung (Member Author), Jul 23, 2016: But they're unit tests for two different methods, one of which is a helper for the other. Sure, they might be similar, but from an organisation perspective, I think it's easier to put the tests next to the module whose methods or functions you are testing. That makes it easier to find and write tests, IMO.

Contributor: The bottom line is to be logical about where tests go so they are easy to find.

My point is that tests for this should be in one place. You can pick where, but consolidate the like tests.

Or are these functions not tested anywhere else?

Member Author: The specific lib functions aren't being tested anywhere, so I intended to put unit tests for them in test_lib.py. The null functions you were referring to are those exposed at the Python layer, not the internal Cython ones (which are what I am fixing).

gfyoung (Member Author) commented Jul 23, 2016

@jreback : good point about the scalar thing - I think I actually don't need to remove ndarray entirely. If I remove the object dtype specification, that might be sufficient. Testing it right now.
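In Cython terms, the two signatures being weighed look roughly like this (a sketch of the idea only, not the exact diff that was merged):

# typed object buffer: fast element access, but segfaults on the 0-field
# arrays described in pandas-devgh-13717
def isnullobj(ndarray[object] arr): ...

# keep the ndarray type (so scalars are still rejected) but drop the [object]
# buffer specification, so element access falls back to Python-level indexing
def isnullobj(ndarray arr): ...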

gfyoung force-pushed the isnullobj-segfault branch from eb1e8ee to cd07769 (July 23, 2016 11:46)
gfyoung (Member Author) commented Jul 23, 2016

@jreback : "fixed" all four isnullobj methods and added tests for all of them, and Travis is still passing. Ready to merge if there are no other concerns.

jreback added the Bug and Compat (pandas objects compatibility with Numpy or Python functions) labels Jul 23, 2016
jreback added this to the 0.19.0 milestone Jul 23, 2016
jreback (Contributor) commented Jul 23, 2016

lgtm. Can you give a perf test and see if anything is material (I don't think so, but it can't hurt), as this is the null tester for strings.

gfyoung (Member Author) commented Jul 23, 2016

What do you mean by "null tester for strings"? What method are you referring to?

jreback (Contributor) commented Jul 23, 2016

This is called for null testing on strings.

gfyoung (Member Author) commented Jul 23, 2016

From where exactly? Are you talking about the pandas.types.missing methods? (Also, why focus on strings specifically?)

jreback (Contributor) commented Jul 23, 2016

Look at how isnull works. That is one of only two places this is called.
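For context, a simplified sketch of the call path being referred to; the helper name below is illustrative, and the real code (in pandas.types.missing at the time) handles more dtypes and shapes:

import numpy as np
from pandas import lib

def _isnull_object_array(values):
    # object-dtype arrays, which is how string data is stored, are routed
    # through the Cython helper touched by this PR
    values = np.asarray(values, dtype=object)
    return lib.isnullobj(values.ravel()).reshape(values.shape)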

gfyoung (Member Author) commented Jul 23, 2016

Okay, that's what I thought. Where should this perf test go in asv_bench? In strings.py I presume?

jreback (Contributor) commented Jul 23, 2016

Oh, I mean there probably is a benchmark that tests null on strings already; if not, see where the null benchmarks are (I think they are all in one place).

gfyoung force-pushed the isnullobj-segfault branch from cd07769 to 1f90104 (July 23, 2016 21:30)
gfyoung (Member Author) commented Jul 23, 2016

@jreback : I found isnull benchmarks in the DataFrame benchmarks, so I added one for strings there.
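For reference, an asv benchmark of that shape looks roughly like the sketch below; the class and method names are assumptions, not the exact code added to the frame benchmarks:

import string

import numpy as np
from pandas import DataFrame, isnull

class frame_isnull_strings(object):
    # asv-style benchmark: build a 1000x1000 frame of strings and time isnull on it

    def setup(self):
        sample = np.array(list(string.ascii_letters) + list(string.whitespace))
        self.df = DataFrame(np.random.choice(sample, (1000, 1000)))

    def time_isnull_strings(self):
        isnull(self.df)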

gfyoung force-pushed the isnullobj-segfault branch from 1f90104 to b10bbef (July 23, 2016 22:02)
gfyoung (Member Author) commented Jul 23, 2016

Travis is doing weird things. Putting [ci skip] in the commit until it corrects itself.

The updated commit message:

Weird segfault arises when you call lib.isnullobj
(or any of its equivalents like lib.isnullobj2d) with
an array that uses 0-field values to mean None. Changed
input to be a Python object (i.e. no typing), and the
segfault went away.

Closes pandas-devgh-13717.

[ci skip]
gfyoung force-pushed the isnullobj-segfault branch from b10bbef to 0338b5d (July 23, 2016 23:15)
jreback closed this in ee6c0cd Jul 24, 2016
        self.sample = np.array(list(string.ascii_lowercase) +
                               list(string.ascii_uppercase) +
                               list(string.whitespace))
        self.data = np.random.choice(self.choice, (1000, 1000))
Contributor: Please test; this was not building on asv (self.choice should be self.sample).

Member Author: Whoops, sorry! I was in a bit of a rush when I wrote it.

jreback (Contributor) commented Jul 24, 2016

Thanks.

There is a minor change I made in the asv benchmark. Also, it IS slightly slower, so I guess the emitted C code is slightly less optimal without the object typing (but we can't have segfaults, of course :)
